Many-core algorithms for statistical phylogenetics

نویسندگان

  • Marc A. Suchard
  • Andrew Rambaut
چکیده

MOTIVATION Statistical phylogenetics is computationally intensive, resulting in considerable attention meted on techniques for parallelization. Codon-based models allow for independent rates of synonymous and replacement substitutions and have the potential to more adequately model the process of protein-coding sequence evolution with a resulting increase in phylogenetic accuracy. Unfortunately, due to the high number of codon states, computational burden has largely thwarted phylogenetic reconstruction under codon models, particularly at the genomic-scale. Here, we describe novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on graphics processing units (GPUs), making use of the large number of processing cores to efficiently parallelize calculations even for large state-size models. RESULTS We implement the approach in an existing Bayesian framework and apply the algorithms to estimating the phylogeny of 62 complete mitochondrial genomes of carnivores under a 60-state codon model. We see a near 90-fold speed increase over an optimized CPU-based computation and a >140-fold increase over the currently available implementation, making this the first practical use of codon models for phylogenetic inference over whole mitochondrial or microorganism genomes. AVAILABILITY AND IMPLEMENTATION Source code provided in BEAGLE: Broad-platform Evolutionary Analysis General Likelihood Evaluator, a cross-platform/processor library for phylogenetic likelihood computation (http://beagle-lib.googlecode.com/). We employ a BEAGLE-implementation using the Bayesian phylogenetics framework BEAST (http://beast.bio.ed.ac.uk/).

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Methods and Architectures for Realizing Fast Phylogenetic Computation Engines Using VLSI Array Based Logic

Evaluating phylogenetics trees is an endeavor fundamental to comparative genomics and a core discipline of Bioinformatics. However, with single trees taking up to a week on the fastest processor under general models of evolution and the number of trees growing exponentially with the number of sequences analyzed, this is an exceptionally computationally intensive endeavor. There has been much wo...

متن کامل

Efficient parallelization of the genetic algorithm solution of traveling salesman problem on multi-core and many-core systems

Efficient parallelization of genetic algorithms (GAs) on state-of-the-art multi-threading or many-threading platforms is a challenge due to the difficulty of schedulation of hardware resources regarding the concurrency of threads. In this paper, for resolving the problem, a novel method is proposed, which parallelizes the GA by designing three concurrent kernels, each of which running some depe...

متن کامل

Hybrid Genetic Algorithm and Lasso Test Approach for Inferring Well Supported Phylogenetic Trees Based on Subsets of Chloroplastic Core Genes

The amount of completely sequenced chloroplast genomes increases rapidly every day, leading to the possibility to build large scale phylogenetic trees of plant species. Considering a subset of close plant species defined according to their chloroplasts, the phylogenetic tree that can be inferred by their core genes is not necessarily well supported, due to the possible occurrence of “problemati...

متن کامل

"principles of Phylogenetics: Ecology

I. Continuous characters – ancestral states Many traits of interest are measured on continuous or metric scales – size and shape, physiological rates, etc. Continuous traits are often useful for species identification and taxonomic descriptions; historically, they were also used in phylogenetic analysis through the use of clustering algorithms that can group taxa based on multivariate phenetic ...

متن کامل

The Minimum Evolution Distance-based Approach to Phylogenetic Inference

Distance algorithms remain among the most popular for reconstructing phylogenies, especially for researchers faced with data sets with large numbers of taxa. Distance algorithms are much faster in practice than character or likelihood algorithms, and least-squares algorithms produce trees that have several desirable statistical properties. The fast Neighbor Joining heuristic has proven to be qu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 25 11  شماره 

صفحات  -

تاریخ انتشار 2009